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Abstract 

Quantitative studies in many fields involve the analysis of multivariate data of diverse types, 
including measurements that we may consider binary, ordinal and continuous. One approach to 
the analysis of such mixed data is to use a copula model, in which the associations among the 
variables are parameterized separately from their univariate marginal distributions. The purpose 
of this article is to provide a simple, general method of semiparametric inference for copula 
models via a type of rank likelihood function for the association parameters. The proposed 
method of inference can be viewed as a generalization of marginal likelihood estimation, in 
which inference for a parameter of interest is based on a summary statistic whose sampling 
distribution is not a function of any nuisance parameters. In the context of copula estimation, 
the extended rank likelihood is a function of the association parameters only and its applicability 
does not depend on any assumptions about the marginal distributions of the data, thus making 
it appropriate for the analysis of mixed continuous and discrete data with arbitrary marginal 
distributions. Estimation and inference for parameters of the Gaussian copula are available via 
a straightforward Markov chain Monte Carlo algorithm based on Gibbs sampling. Specification 
of prior distributions or a parametric form for the univariate marginal distributions of the data 
is not necessary. 

Some key words: Bayesian inference, latent variable model, marginal likelihood, Markov chain 
Monte Carlo, multivariate estimation, polychoric correlation, rank likelihood, sufficiency. 

1 Introduction 

Studies involving multivariate data often include measurements of diverse types. For example, a 
survey or observational study may record the sex, education level and income of its participants, 
thus including measurements that we may consider binary, ordinal and continuous. Such studies 
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1 



o 

<N _. 
o - 

o 



o 
o - 

d 



20 



~ I 1 

40 60 
INC 



80 



d 



d 



en 
d 



d 



c 
d 



100 



~~ 1 1 1 

2 3 4 
DEC 



o 
d 



o 

O — I 

d 



1 



D 



t — i — i — i — r 

2 3 4 5 6 
CHILD 



d 



d 



d 



o 
d 



d 



2 3 
PINC 



o 
d 



□ 



I 1 1 1 1 

12 3 4 
PDEG 



d 



o 
d 



o — 

d 



o 

O — I 

d 



~~ i r~ 

10 15 
PCHILD 



— 1 

20 



Figure 1: Univariate histograms of the GSS data. 



are generally concerned with statistical associations among the variables, but not necessarily the 
scale on which the variables are measured. One approach to data analysis in these situations is to 
obtain rank-based measures of bivariate association, such as the rank correlation or "Spearman's 
rho". Such procedures are scale- free, but involve ad-hoc methods for dealing with ties and pro- 
vide inference that is generally limited to hypothesis tests of bivariate association. These issues 
make such procedures problematic for the analysis of much of social science survey data, in which 
the variables are often discrete and the hypotheses of interest generally concern multivariate and 
conditional associations. For example, Figure Q] shows histograms of six demographic variables of 
male respondents to the 1994 General Social Survey. The variables INC, DEG and CHILD refer 
to the income, highest degree and number of children of a survey respondent, and PINC, PDEG 
and PCHILD refer to similar variables of the respondent's parents (further details on the variables 
are given in Section 4). All of these variables are ordered categorical variables, even though some 
of them have many levels. Additionally, our interests in these variables involve measures of con- 
ditional association: An assessment of the relationship between income and number of children 
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Response 


INC 


CHILD 


DEG 


Predictor 
AGE 


PCHILD 


PINC 


PDEG 


INC 


NA 


1.10 (.11) 


7.03 (<.01) 


.34 (<.01) 


4.07 ( <.01) 


.28 (.41 ) 


1.40 ( .12) 


CHILD 


.01 (.01) 


NA 


-.07 (.06) 


.04 (<.01) 


-.06 (.20) 


.02 (.08) 


-.05 (.20) 



Table 1: Estimation linear and Poisson regression coefficients in the conditional models for INC 
and CHILD, with p-values in parentheses. 

would generally be considered incomplete if it failed to account for heterogeneity of the survey 
respondents in terms of their age, parental income and other variables. 

The standard approach to making conditional assessments of statistical association is the use 
of regression models. For example, to describe the conditional association between income and 
number of children we could estimate the parameters in a regression model of the following form: 

INCi = p + /3iCHILDi + /3 2 DEGi + /3 3 AGEj + /^PCHILD; + /3 5 PINQ + /3 6 PDEGj + e { (1) 

Least-squares parameter estimates for this model, along with normal-theory p- values appear in the 
first row of Table [TJ Standard practice is to interpret the p-value of 0.11 for CHILD as suggesting 
that there is not substantial evidence against 0i = 0, in which case the model implies that INC 
and CHILD are conditionally independent given the other variables. Alternatively, we could have 
evaluated the same conditional independence hypothesis with a regression model for CHILD. As 
this is a count variable, we might use a Poisson regression model: 

CHILD; ~ Pois(exp{/3 + + /3 2 DEG; + /3 3 AGE; + /3 4 PCHILD; + ftPINQ + /? 6 PDEG;}) (2) 

Maximum likelihood estimates and p- values for this model appear in the second row of Table [TJ In 
contrast to the results of Model (UJ), these results indicate reasonably strong evidence (p = 0.01) 
that CHILD and INC are not conditionally independent, given the other variables. 

The contradiction between the above two analyses is partly due to the inadequacies of the 
simple univariate parametric Gaussian and Poisson models. However, in general there is no reason 
to expect that two separately estimated conditional models will give compatible results: Given 
two conditional models fi(yi\y2,x) and ^(^lyi, x), om y under very specific conditions does there 
exist a joint probability d istribution p(j/i , j/ 2 |x) having fi and / 2 as its full conditional distributions 



Arnold and Press 



1989]. This presents a problem for the analysis of multivariate data of diverse 
types: in the absence of an appropriate multivariate model, common practice is to analyze the data 
via one or more univariate regression models, choosing the "response" from the variables which 
might best fit an ordinary or generalized linear regression model. However, as the above example 
shows, different choices about which variables to treat as the response can lead to incompatible 
models with different conclusions. 
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Part of the above problem can be resolved by jointly modeling the variables of interest. A 
number of latent-variable methods have been recently developed to accommodate non-Gaussian 
multivariate data. These methods generally proceed by modeling each component of a vector of 
observations with a parametric exponential family model, i n which the parameters for ea ch compo- 



nent involve an unobserved latent variable. For example, iChib and Winkelmannl [20011 ] present a 
model for a vector of correlated count data in which each component is a Poisson random variable 
with a mean depending on a component-specific latent variable. Dependence among the count vari- 
ables is induced by modeling the ve ctor of la t ent v ariables with a multivariate nor mal distribution . 
Similar approaches are proposed by iDunsonl [20001 ] and described in Chapter 8 of ICongdonl [2003 ]. 



The model of Chib and Winkelmann can be viewed as a copula model, in which the association 
parameters are modeled separately from the marginal distributions of the observed data. Such 
a modeling approach can be applied to a wide variety of multivariate analysis problems: An old 
mathematical result known as Sklar's Theorem says that every multivariate probability distribution 
can be represented by its univariate marginal distributions and a copula, which is a type of joint 
dis tribution w i th fix ed marginals. 



Pitt et al. 



20061 ] develop an estimation procedure for multivariate normal copula models in 



which the marginal distributions belong to specified parametric families. Unfortunately, the marginal 
distributions of survey data such as age, number of children, income and education level generally 
do not belong to standard families. For such data a semiparametric estimation strategy may be 
appropriate, in which the associations among the variables are represented with a simple parametric 
model but the marginal distributions are es timated nonparametrically. In the case where all the 
variables are continuous. iGenest et al.l [19951 ] suggest a "pseudo-likelihood" approach to estimation, 
in which the observed data is transformed via the empirical mar ginal distributions to obtain p seudo- 



data that can be used to estimate the association parameters. iKlaassen and Wellnerl [19971 ] study 
a similar type of estimation in the case of the Gaussian copula. Such estimators are well-behaved 
for continuous data but can fail for discrete data, making them somewhat inappropriate for the 
analysis of mixed continuous and discrete data. For ordinal discrete data with a known number of 
categor i es, th e dependence induced by the Gaussian copula model is called polychoric correlation. 



Olsson 



19791 ] describes a two-stage estimation procedure for the parameters in the copula, and 



this and other estimation strategies appe ar in a number of so ftware packages including SAS PROC 
FREQ and the LISREL module PRELIS. iKottas et al.l 2005f | describe a nonparametric estimation 



procedure in which the copula is based on a mixture of normal distributions. However, such pro- 
cedures do not accommodate continuous data, and may even be problematic for discrete data with 
a large number of categories, as inference in this case requires the simultaneous estimation of the 
large number of parameters specifying the marginal distributions. 

As an alternative to these procedures, this article presents an approach to copula estimation in 
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which the marginal distributions are arbitrary and of unspecified types, thus accommodating both 
discrete and continuous data. This is achieved by the use of a likelihood function that depends on 
the association parameters only, and does not make assumptions about the form of the univariate 
marginal distributions. Inference based on such a likelihood is therefore appropriate for the joint 
analysis of continuous and ordinal discrete data. For continuous data, the likelihood function we 
propose is derived from the m arginal probability of the ranks, an d can be seen as a multivariate 
version of a "rank likelihood" Pettittl . Il982l . iHeller and Qinl . 120011 ] which does not depend on the 



univariate marginal distributions. Unfortunately, for discrete data the probability of the observed 
ranks is not free of these nuisance parameters. To solve this problem, we derive a likelihood that 
is equivalent to the distribution of the ranks for continuous data but is also free of the nuisance 
parameters for discrete data. This likelihood function is derived from the probability that the latent 
variables of the copula model satisfy the partial ordering induced by the observed data. We call 
this function an extended rank likelihood, as it generalizes the concept of rank likelihood. This 
likelihood can also be seen as a generalization of a marginal likelihood, which is based on a statistic 
whose sampling distribution depends only on the parameter of interest and not on any nuisance 
parameters. 

In what follows we work with the Gaussian copula model, although the basic ideas can be 
extended to other parametric families of copulas. In the next section we review the general Gaussian 
copula model, and discuss how inference for discrete data using existing semiparametric methods is 
problematic. Section 3 derives the extended rank likelihood as a general approach to semiparametric 
copula estimation and discusses parameter estimation in the context of Bayesian inference using a 
relatively simple Gibbs sampling scheme. 

The primary goal of this paper is to provide a simple method of inference for the multivariate 
relationships between variables, such as INC, CHILD, DEG described above, whose univariate 
marginal distributions cannot be well approximated with simple parametric models. In Section 
4 we present an analysis of these and other demographic characteristics of males in the 1994 
U.S. workforce and their parents. In particular, we are interested in the statistical associations 
among income, education and number of children of the survey respondents, and how they relate 
to similar characteristics of the parents of the survey respondents. The data come from the 1994 
General Social Survey, and include a number of discrete and non-Gaussian random variables. In 
addition to estimating a Gaussian copula model for these data, we estimate and describe the 
conditional dependencies among the variables on the Gaussian scale, as well as provide predictive 
and conditional distributions on the original scale of the data. 

Section 5 considers notions of statistical sufficiency relevant to the rank likelihood, and a dis- 
cussion follows in Section 6. 
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2 Semiparametric copula estimation 



Let yi and 1/2 be two random variables with continuous CDF's Fx and F2. The transformed 
variables u\ = F\(yi) and U2 = ^2(2/2) both have uniform marginal distributions. The term 
"copula modeling" generally refers to a model that parametrizes the joint distribution of u\ and 112 
separately from the marginal distributions F\ and F%. A semiparametric copula model includes a 
parametric model for the joint distribution of u\ and U2, but lacks any parametric restrictions on 
Fx or F 2 . 

Any continuous multivariate distribution can be used to form a copula model via an inverse-CDF 
transformation. For example, the bivariate normal distribution can be used to generate dependent 
data with arbitrary marginals Fx and F2 as follows: 





1. sample 




bivariate normal 




2. set yx = F^Mzx)], y 2 = i^PM. 

where -F _1 (u) = inf{y : F(y) > u} denotes the pseudo-inverse of a CDF F. The correspondence to 
the usual copula formulation can be seen by noting that = u is uniformly distributed. 

Suppose (2/1,1,2/1,2); ••• j (2/n,i) 2/71,2) are samples from a population that we wish to model with 
a Gaussian copula. If the marginal distributions Fx and F2 were continuous and known, then the 
values Zij = $~ 1 [Fj(yij)] could be treated as observed data and p could be estimated directly from 
the z's, perhaps using the unbiased estimator p = \ Y17=i z i,l z i,2- Of course, the marginal CDF's 
are not typically known. One semiparametric estimation strategy is to plug-in the the empirical 
CDF's Fx and F2 to obtain pseudo-data Zij = ^~ 1 [^r[Fj(yij)] = $~ 1 [Fj(yij)], where the rescaling 
is to avoid infinities. For continuous data, the estimator p = ~ Yl7=i ^,1^,2 is asymptotically 
equivalent to the asymptotically efficient Van de r Waer den normal-scores rank correlation coefficient 



Hajek and Sidak 



1967 



Klaassen and Wellner. 



19971 ] . This estimator is similar to one obtained 



Tom a more general pseudo-likelihood estimation procedure described and studied by 



Genest et al 



19951 ] . In the context of the Gaussian copula model, the maximum pseudo-likelihood procedure is 



to 

1. set Zij = $- 1 [F i (?/i, i )]; 

2. maximize in p the pseudo-log-likelihood Ya=1 l°&kvn(5i,i, Zi,2\p), 

where bvn(-|p) denotes the bivariate normal density with standard normal marginals. Genest et 
al. show that the resulting pseudo-likelihood estimator is consistent and asymptotically normal 
under the condition that Fx and F2 are continuous. However, this condition calls into question the 
appropriateness of the pseudo-likelihood approach for non-continuous data such as sex, education 
level, age or any other type of data where there are likely to be ties. 
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What could go wrong with such an estimator in situations involving discrete data? In general, 
these pseudo-data estimators of copula parameters will be problematic for discrete data because 
transformations of such data do not really change the data distribution, they just change the 
sample space. Consider the simple case of a continuous variable y\ and a binary variable 1/2 such 
that Pr(j/2 = 0) = Pr(j/2 = 1) = 1/2. Letting Zij = $~ 1 [Fj(yi : j)], the distribution of 2:1,1, • • • , 
will have an approximately standard normal distribution, but Zip will be approximately equal to 
either $ -1 (iriTi) or with probability one-half each. If the Gaussian copula model is 

correct, then one can show that the expectation of p is roughly — ^= t I>~ 1 (^pj ) . As n increases so 
does the expectation of p, and it is not a consistent estimator. One problem here is that all of the 
Zi^s such that yip = 1 are being pushed to the extreme standard normal quantile (^jt^), which 
in the case of continuous data would happen just to a single datapoint. The situation is only partly 
improved by using the sample correlation of the pseudo-data as an estimator: The variance of z\ 
is approximately 1 and the variance of Z2 is approximately [|$ -1 (;j?pi)] 2 > giving an approximate 
sample correlation of Cor (2:^1, Zip) ~ p^2/ir. 

3 Estimation using the extended rank likelihood 

In this section we derive a likelihood function that depends on the association parameters and 
not on the unknown marginal distributions. For continuous data this function is equivalent to 
the distribution of the multivariate ranks. This is not the case of discrete data, for which the 
distribution of the ranks depends on the univariate marginal distributions. In this case the derived 
likelihood function contains less total information than one based on the ranks, but it is free of any 
parameters describing the marginal distributions. 

3.1 Extended rank likelihood 

Generalizing from the previous section, the Gaussian copula sampling model can be expressed as 
follows: 

zi,...,z n [C ~ i.i.d. multivariate normal(0, C), (3) 

vij = F-'mzij)}, 

where C is a p x p correlation matrix and each F^ 1 denotes the (pseudo) inverse of an unknown 
univariate CDF, not necessarily continuous. 

Our goal is to make inference on C, and not on the potentially high-dimensional parameters 
Fi, . . . ,F p . If the z's were observed we could use them to directly estimate C. The z's are not 
observed of course, but the y's do provide a limited amount of information about them, even absent 
any knowledge of the F's: Since the F's are non-decreasing, observing y^j < yi 2 j implies that 
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Zi lt j < z,i 2 j. More generally, observing Y = (y l5 . . . , y n ) T tells us that Z = (zi, . . . ,z n ) T must lie 
in the set 

{Z G R nxp : max{zk,j : Vkj < Vi,j} < < mm{z k j : y it j < ykj}}- 

We can take the occurrence of this event as our data. Letting D be the fixed subset of M nxp 
generated by the observed value of Y, we can calculate the following "likelihood" : 

Pr(Z eD\C,Fi,...,F p ) = [ p(Z|C) dZ = Pr(Z G D\C). (4) 

Jd 

As a function of the parameters, this likelihood depends only on the parameter of interest C and 
not the nuisance parameters F\, . . . , F p . Estimation of C can proceed by maximizing Pr(Z G D\C) 
as a function of C, or by obtaining a posterior distribution Pr(C|Z G D) oc p(C) x Pr(Z G D\C). 

The likelihood function @ can be seen as a type of marginal likelihood function for estimation in 
the presence of a nuisance parameter: Consider a generic statistical problem in which the density 
for data y depends on a parameter of interest 6 and a nuisance parameter ip. If there exists a 
statistic t(y) whose distribution depends on 9 only, then the density of y may be decomposed as 

= P{t{y)\0) xp{y\t(y),e,4>). 
In this situation, estimation of 9 can be based on the marginal likelihood p(t (y)\0), eliminat ing the 



need to estimate the nuisance parameter tp (see, for example, Section 8.3 of ISeverinil 20001 ] ). The 
likelihood function Pr(Z E D\C) in our copula estimation problem can be derived analogously, by 
decomposing the probability of the observed data as 

p(Y|C,F 1 ,...,F p ) = p(ZeD,Y\C,F h ...,F p ) (5) 
= Pr(Z G -D|C) x p(Y|Z G D,C,F 1: . . . ,F p ). (6) 

Equation ([5]) holds because the event Z G D occurs whenever Y is observed. This derivation 
can be made rigorous by deriving the density p(Y\C, F\, . . . , F p ) from the limit of Pi(r\i t j(yij — 
e, j/jj] |C, F\, ... , F p ) as e — > 0. As in the case of marginal likelihood, our approach is to estimate C 
using only Pr(Z G D\C), the part of the observed data likelihood © that depends on the parameter 
of interest C and not on the nuisance parameters iq, ... , F p . Since our likelihood function is based 
on the marginal probability of an event that is a superset of observing the ranks, we refer to it as 
an extended rank likelihood. 



3.2 Estimation of the copula parameters 

Bayesian inference for C can be achieved via construction of a Markov chain having a stationary 
distribution equal to p(C|Z G D) oc p(C) x p(Z G D |C). In the case of the Gaussian cop- 
ula with a semi-conjugate prior distribution, the Markov chain can be constructed quite easily 



8 



using Gibbs sampling. This prior distribution for C is defined as follows: Let V have an inverse- 
Wishart(z/o, ^Vo) prior distribution, parameterized so that E'fV -1 ] = Vq 1 , and let C be equal in 
distribution to the the correlation matrix with entries Using this prior distri- 

bution, approximate samples from p(C|Z G D) can be obtained by iterating the following Gibbs 
sampling scheme: 

Resample Z. Iteratively over sample Zij from p(zij\V, Z[_j _j], Z G D) as follows: 

For each j G {1, . . . ,p} 

For each y G unique^ij, . . . , y n j} 

1. Compute zi = max{z itj : y id < y} and z u = mm{z itj : y < y itj } 

2. For each i such that yij = y, 

(a) compute a) = \ [hj] - V^^V-^^V,^^ 

(b) compute m tj = Z^^Vy^V^^^ 

(c) Sample u id uniformly from <Z>[ z \^' j ]) 

(d) Set z^j = m j + Uj x ^^(uij) 

Resample V. Sample V from an inverse- Wishart(Vo + n, uqVq + Z T Z) distribution. 
Compute C. Let C [id] = V {i tj] / 

Iteration of this algorithm generates a Markov chain in C whose stationary distribution is p(C\Z G 
D). This algorithm is easily modified to accommodate data that are missing-at-random: If is 
missing, the full conditional distribution of Zij is the unconstrained normal distribution with mean 
fiij and variance <j| given above. 

The reader may have noticed that the samples of Z are based on the covariance matrix V and 
not the correlation matrix C. To see why this does not matter for estimation of C, compare our 
original model, 

V ~ inverse- Wishart(^o, ^0^0 ) 

{ c [m']} = { v [m]/ V V M V [iJ']} 
zi, . . . , z„ ~ i.i.d. multivariate normal(0, C) 

to the equivalent model 

V ~ inverse-Wishart(^o, ^0^0) 
z±, . . . ,z n ~ i.i.d. multivariate normal(0, V) 

Zi,j = Zij/ an d let C = Cov(z) 

Ui,j = Gj(Zij). 
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The z's in the first formulation are equal in distribution to the z's in the second, and so posterior 
inference for C is equivalent under either model. The Gibbs sampling scheme outlined above is 
based on a Markov chain in V and zi, . . . ,z n based on the second formulation. Note that in this 
formulation the observed data implies the same ordering D on both the z's and the z's. Additionally, 
posterior estimation of C is invariant to changes in the prior distribution on V that do not alter 
the induced prior on C. For example, if Vo and Vq are two different covariance matrices with the 
same correlations, then the posterior distribution of C under V ~ inverse-Wishart(i'O) vqVq) will 
be equal to that under V ~ inverse-Wishart^o, ^oVq) . 



4 Income, education and intergenerational mobility 



The U.S. census reports a stron g positive relationship between income and educational attain- 



ment 



Day and Newburgerl . 120021 ] . However, in many studies both of these variables have been 



shown to be associated with a number of family backgro und variables such as parental i ncome 



paren tal educational attainment and number of siblings Ermisch and Francesconil . 



2001 



Blake 



19851 ]. Additionally, some r esearchers have suggeste d that having children reduces opportunities 



for educational attainment 



Moore and Waitei . Il977l ] , while others have found evid ence that eco- 



nomic status of males is positively associated with their fertility Hop croft] . 120061 ] . Results such 
as these are generally based on univariate regression models in which one variable from a sample 
survey is selected as a "response" or "dependent" variable and the others as "control" or "inde- 
pendent" variables. However, all of the variables in these studies are randomly sampled and all are 
potentially dependent on one another. 

In this section we describe the multivariate dependencies among income, education and number 
of children using the Gaussian copula model and the semiparametric estimation procedure described 
in Section 3. Specifically, we analyze survey data on 1002 males in the U.S. labor force (meaning 
not retired, in school or in an institution), obtained from the 1994 General Social Survey. Data 
and details for the survey are available at http://webapp.icpsr.umich.edu/GSS/, 

The relevant variables for this analysis include the income, education, and number of children of 
the survey respondent, as well as similar variables for the respondent's parents. Age of the survey 
respondent is additionally included, as it is typically strongly related to income and number of 
children. The measurement scales for these variables are as follows: 
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Figure 2: MCMC samples of 11 of the correlation coefficients, plotted every 50th scan. 



INC: income of the respondent in 1000s of dollars, binned into 21 ordered categories 

DEG: highest degree ever obtained (None, HS, Associates, Bachelors, Graduate) 

CHILD: number of children ever had 

PINC: financial status of respondent's parents when respondent was 16 (on a 5-point scale) 

PDEG: maximum of mother's and father's highest degree 

PCHILD: number of siblings of the respondent plus one 

AGE: age of the respondent in years 

Missing data rates among each of the non-income variables was less than 4%. The missing data 
rates for INC and PINC were 10% and 48% respectively. However, the question PINC was asked 
on only half of the surveys, and so missing values for this variable can reasonably be considered as 
missing at random. 



4.1 Estimation of C 

Using an inverse- Wishart (p + 2, {p + 2) x I) prior distribution for V, the Gibbs sampling scheme 
outlined in Section 3 was iterated 25,000 times with parameter values saved every 10 scans, resulting 
in 2500 samples of C for posterior analysis. Mixing of the Markov chain was quite good: Figure [2] 
shows MCMC samples of 11 elements of C, corresponding to the odd order statistics of i?[C|Z 6 D]. 
Convergence to stationarity appears to occur quickly, almost certainly within the first 5000 scans. 
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Figure 3: Dependence parameters for the GSS data. The first row gives 2.5%, 50%, 97.5% posterior 
quantiles of the correlation coefficients E[zjZk]- The second row gives the regression coefficients 
VE[ Zj \z^]. 



Dropping these scans to allow for burn-in, we are left with 2000 saved scans for posterior analysis. 
The autocorrelation across these saved scans was low, with the lag-10 autocorrelation less than 
0.05 in absolute value for all elements of C, and much closer to zero for most. Based on the 
autocorrelation in the Markov chain, the effective sample sizes for estimating the posterior means 
of the elements of C were at least 1500. 



4.2 Posterior inference 

Posterior distributions of the correlation parameters are summarized in the first and second rows of 
Figured The first row gives 2.5%, 50% and 97.5% posterior quantiles of the correlation coefficients, 
representing scale-invariant bivariate associations among the six variables of interest. The fact that 
most of these 95% credible intervals do not contain zero indicates that most variables are associated 
with most of the other variables. For example, the results suggest that INC has non-zero positive 
correlations with DEG, CHILD, PINC, PDEG and AGE, and a weak negative correlation with 
PCHILD. DEG shows positive correlations with INC, PINC , PDEG, and negative correlation with 



Blake 



|l985l p. 



PCHILD (in accordance with the conclusion of 

Perhaps of more interest are conditional associations. The second column of Figure gives the 
2.5%, 50% and 97.5% quantiles for the "regression coefficients" Cy^-jC^. for each variable. 
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Figure 4: Reduced conditional dependence graph for the GSS data. 



These coefficients represent conditional dependencies among the underlying processes that give rise 
to the observed data. On this scale, the full conditional distribution of INC depends most strongly 
on DEG, and to a lesser extent on CHILD and AGE. Interestingly, the conditional relationship 
between INC and PINC has a non- negligible (> 5%) probability of being less than or equal to zero. 
Figure H] summarizes these results with a graph indicating the conditional dependencies among the 
z- variables corresponding to the six variables of interest (implicitly conditioning on AGE). An edge 
is present between two nodes if the 95% credible interval for the associated regression parameter 
does not contain zero. This graph suggests that although INC and PINC are positively associated, 
this association is mediated by the intergenerational relationships of DEG, PDEG, CHILD and 
PCHILD. 

4.3 Conditional distributions for the INC, DEG, PINC relationship 

The results in Figure [3] suggest that, although INC and PINC are positively correlated, PINC is a 
relatively weak predictor of INC compared to DEG. However, PINC is a strong predictor of PDEG, 
and PDEG is a strong predictor of DEG, suggesting an indirect effect of PINC on INC. 

These conclusions about INC, DEG and PINC are made in terms of associations among the 
z- variables, although it is often desirable to report results on the scale of the original data. With 
this in mind, we now describe the relationship between INC, DEG and PINC on the original data 
scale, using an estimated predictive distribution Pr(INC, DEG, PINC), which we decompose as 
Pr(INC|DEG,PINC) x Pr(DEG|PINC) x Pr(PINC). 

A predictive distribution for y can be obtained in a few different ways. Perhaps the simplest 
method is to combine the posterior distribution of C with the empirical univariate marginal dis- 
tributions F\ , . . . , F p of the observed data (an alternative method is presented in the Discussion) . 
Using this method, a predictive sample of y can be obtained as follows: 
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1. sample C ~p(C|Z G D); 

2. sample z ~ multivariate normal(0, C); 

3. set yj = F~ 1 {zj). 

Although this somewhat ad-hoc approach disregards uncertainty in the estimation of F± , . . . , F p (for 
prediction of y, not for estimation of C), it provides a predictive joint distribution that matches 
the observed data in terms of the univariate marginal distributions but has a simple, smooth Gaus- 
sian copula representing multivariate dependence. Prom these predictive samples we can obtain 
Monte Carlo estimates of various quantities of interest, including a consistent set of conditional 
distributions on the original scale of the data. 

The first column of Figure [5] plots the predictive distribution of DEG conditional on PINC = x 
for x E {1,2,3,4,5}. As on the z-scale, large values of PINC correspond to large values of 
DEG. The estimated conditional probability of someone not finishing high-school given PINC=5 
is 5%, whereas for PINC=1 it is 22%, giving an odds ratio of odds(DEG=None|PINC=l) / 
odds(DEG=None|PINC=5) = 5.35. Similarly, the corresponding odds ratio for having a grad- 
uate degree is odds(DEG=Grad|PINC=5) / odds(DEG=Grad|PINC=l) = 6.5. For comparison, 
the empirical conditional distributions are provided on the same plot. In general the fit is good, 
with most of the discrepancies occurring in categories of PINC with small sample sizes (n = 28 for 
PINC=1, and n = 8 for PINC=5). Note that if we were to estimate the above odds ratios using 
the empirical conditional distributions we would obtain ratios equal to infinity. In situations such 
as these where the sample size is low, we may prefer to estimate conditional distributions with a 
model that can share information across the categories of a variable, rather than use an empirical 
estimator having a high sampling variability. 

The second column of Figure [5] displays estimated quantiles of Pr(INC|DEG, PINC) for each 
combination of DEG and PINC. Specifically, each row corresponds to a single value of DEG, and 
each boxplot within a row corresponds to a single value of PINC. The boxplot provides 5, 25, 50, 
75 and 95% quantiles of Pr(INC|DEG, PINC). Note that the boxplots within a row indicate very 
small increases in INCOME with increasing values of PINC, while differences across rows indicate 
much larger increases with DEG (changes in the quantiles do not happen continuously due to the 
binned nature of the raw data). For high-school graduates (DEG=1), the estimated conditional 
mean incomes across levels of PINC are {23, 25, 26, 28, 29} in thousands of dollars. For college 
graduates (DEG=2), the estimated means are {41,41,43,44,47}. For these mean calculations, the 
income in a binned income category was taken as the average of the endpoints of the bin. 

For comparison, the actual values of INC for each combination of DEG and PINC are plotted 
on the corresponding boxplots (data are jittered to allow ties to be distinguished). As before, 
the main discrepancies occur for combinations of DEG and PINC for which there are few data. 
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Also, the predictive distributions based on the copula model are much smoother than the empirical 
versions: The empirical conditional means of INC for DEG=1 and DEG=3 are {23, 27, 24, 27, 8} 
and {41,44,35,58,75} respectively, across increasing levels of PINC. However, several of these 
empirical means are calculated from as few as 3 or 4 samples. 



5 Notions of sufficiency 



The extended rank likelihood described above can be viewed as a generalization of margina 



likeli- 



hood, a standard technique for dealing with nuisance parameters (see Section 8.3 of lSeverinil [200C ] 
for a review). One benefit of using such a likelihood is a gain in robustness, as inference no longer 
depends on assumptions about the relationship of the data to the nuisance parameters. Another 
benefit is a general simplification of the estimation problem, as the need to estimate a potentially 
high-dimensional set of parameters is eliminated. These benefits come at the cost of potentially 
losing information about the parameters of interest by only using part of the available data. Ide- 
ally, the statistic that generates the marginal likelihood is "partially sufficient" in the sense that it 
contains all relevant information in the da ta about the parameter of interest. Various definitions 
of partial sufficiency have been developed: 



Fraser 



19561 ] defined S-sufficiency via properties of the 



marginal and con ditional distribu tions of the statistic and the data. The concept of G-sufficiency 
was introduced in [Barnard] 1963 ] as a general principle for making inference about a para meter of 



intere st when the inference problem remains invariant under a group of transformations. 



Remon 



19841 ] developed a generalization of t hese notions based on profile likelihoods called L-sufficiency 



which has been refined and studied by iBarndorff-Nielsenl 198a . 119991 ]. The general recommendation 
of these authors is to base inference for a parameter of interest on the sampling distribution of a 
statistic that is sufficient in some sense. 

If Fi, . . . , F p are all continuous then there are no ties among the data, and knowledge of Z G D 
provides a complete ordering of {y±j, ■ ■ ■ ,Un.j} for each j. This information is equivalent to the 
information contained in the ranks, and so Pr(Z G D\ C) is equivalen t to the sampling distribution 



of the multivariate ranks. Following the nota tion of|Remon 
r(Y) are a G-sufiicient statistic in the sense of [Barnard 



19841 ] we now show that the ranks 



19631 ]: Let C G C describe the copula and 



F = {F\, . . . ,F p } G T the marginal distributions, and so the parameter space is fi = C x T and 
the model space is V = {Pr(-|o;) : co G fi}, where Pr(-|u;) is a probability measure on R p for each 
ijj G fi. Furthermore, let Q be the group of collections of p continuous strictly increasing functions, 
so that Q = {G = (G*i, . . . ,G P ) : Gj is a continuous and strictly increasing function on M.}. To 
each G G G there corresponds a one-to-one function on V mapping P(-\u>) to P(G^ 1 (-)\u>) and the 
model space is closed under the action of Q. As a result, Q induces a group Q = {/q : G G G} on 
n defined by P(-\f G u) = P(G- 1 (-)|w). 
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Figure 5: Empirical and predictive conditional distributions for INC, DEG and PINC. 
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If the marginals are continuous the orbits of Q, under Q can be put into 1-1 correspondence 
with C, and C is therefore a maximal invariant parameter. Barnard defined a statistic t(Y) to be 
G-sufficient if it can be put into 1-1 correspondence with the orbits of MP under Q. This is the case 
for the ranks r(Y) of Y, and so r(Y) is said to be G-sufficient for estimation of C. For continuous 
data, the marginal distribution of the ranks is equal to the extended rank likelihood, and so basing 
inference on this likelihood function can been seen as using all available, relevant information in 
the G-sufficient sense. 

A notion of sufficiency that is more directly related to maximum likelihood estimation is In- 
sufficiency: In the context of copula modeling, a statistic t(Y) is said to be L-sufficient for C 
if 

Al. i(Y ) = i(Yi) su P{Fli ... iFp}e ^p(Y |C,Fi, ... ,F p ) = sup {Flr .. )Fjj}e ^ : p(Yi|C,Fi ! . . .,F p ); 

A2. p(t(Y)\C,F 1 ,...,F p )=p(t(Y)\C). 

Note that the maximum likelihood estimate of C and its distribution will be a function only of 
an L-sufficient statistic, if one exists. If T contains only co ntinuous marg inals , then one can show 



directly that the ranks r(Y) satisfy Al and A2 (alternatively, iRemonl [19841 ] shows that a G-sufficient 
statistic is also L-sufficient). Thus in the continuous case, the ranks are G- and L-sufficient, the 
MLE of C is a function of the ranks alone, and inference for C can be based on the distribution of 
the multivariate ranks, or equivalently, the extended rank likelihood. 

If the marginals are allowed to be discontinuous then the orbits of Q under Q cannot be put 
into 1-1 correspondence with C and so C is not a maximal invariant. The problem is basically 
that if Fj(-) is a discrete CDF, then L 7 [G~ 1 (-)] does not range over the space of all CDF's as G 
ranges over Q. The ranks are no longer L-sufficient either: Condition Al holds but A2 is violated 
because in the discrete case the distribution of the ranks depends on the marginal distributions. 
This means that estimation based on Pr(r(Y)|C, F\, . . . ,F p ) requires estimation of the nuisance 
parameters F%, . . . , F p . This may not be much of an issue if the number of levels of each variable 
is low, but for moderate numbers of levels we may wonder about the variability of the estimates 
due to the large number of parameters, or the need to specify a prior distribution for the marginals 
Fi, . . . , F p in the context of Bayesian estimation. In contrast, the extended rank likelihood based on 
Pr(Z G D\C) does not depend on F\, . . . , F p , thereby reducing the number of parameters to estimate 
and eliminating any need for a prior distribution on Fi, . . . , F p . Furthermore, the extended rank 
likelihood is "sufficient" for continuous data but can be used with mixed continuous and discrete 
data. However, the concern remains that the this likelihood may not be making full use of the 
information in discrete data about the copula parameters of interest. It would be desirable to 
describe precisely any potential information loss that results from using the rank likelihood as 
opposed to a full likelihood approach. Such a description could be obtained by comparing the 
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curvatures of the extended rank likelihood and full likelihood surfaces, although the complicated 
parameter space and likelihood functions make description difficult except for the simplest of cases. 
A general description of the information properties of the rank likelihood in the context of copula 
estimation is a current research interest of the author. 

6 Discussion 

This article has presented an inferential procedure for copula parameters that can be applied to 
mixed continuous and discrete data. The procedure is based on a type of marginal likelihood, 
called an extended rank likelihood, which does not depend on the univariate marginal distributions 
of the data. The procedure therefore allows for the estimation of dependence parameters without 
the burden of having to estimate the marginal distributions. 

The data analyzed in this paper are categorical, although some of the variables have very large 
numbers of categories. An alternative approach to the analysis of categorical data is log-linear 
modeling. For categorical data, a logdinear model can potentially provide a more detailed repre- 
sentation of complex dependencies and interactions than can a Gaussian copula model. However, if 
the number of categories is large and the data are ordinal, a copula model might be more appropri- 
ate. The variables AGE, INC and PCHILD in this article have 60, 21 and 19 categories respectively. 
Stable log-linear analysis of these data would require a coarsening of these and perhaps some of the 
other variables into many fewer categories, resulting in information loss. In contrast, the semipara- 
metric Gaussian copula approach taken here provides a simple dependence model for data having 
arbitrary marginal distributions, discrete or continuous. 

The Gibbs sampling algorithm described in Section 3.2 is quite simple and performs well for 
the data analysis in Section 4. However, the fact that each Zij is being sampled one at a time, and 
from a distribution that is constrained by the values of {zkj '■ k ^ i}, might raise concerns that the 
simple Gibbs sampler might mix poorly in some situations. If poor mixing occurs, one remedy is to 
add Metropolis-Hastings updates that propose simultaneous changes to multiple Zjj's. One such 
procedure that I have implemented is to propose changes to the set {zjj : i = 1, . . . , n) by shuffling 
the distances between the order statistics. In the examples I have tried, this type of procedure has 
given reasonable acceptance rates and has reduced autocorrelation. 

Inference on the scale of the original data can be obtained with a posterior predictive distribution 
based on plugging in the empirical univariate marginal distributions as described in Section 4.3. 
Alternatively, a predictive distribution which accounts for uncertainty in the univariate marginal 
distributions can be derived as follows: The Gibbs sampling scheme of Section 3 can be used to 
generate a joint posterior distribution for zi, . . . ,z n in addition to a new sample z n+ i, for which 
we do not observe y- values. However, if z n+ ij is between two other Zj's having the same yj value, 
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then y n +i,j must equal yj as well since the q ,-'s are non-decreasing . Technically, this produces a 
type of interval probability distribution for y [Weichselbergerl . 1 19951 ] , and for continuous data gives 
univariate marginal predictive probabilities equivalent to the A n procedure of Hillj 19681 ] . For large 
n however, this procedure is essentially equivalent to using the the empirical marginal distributions. 

Although this article has focused on semiparametric estimation of a Gaussian copula, the notion 
of rank likelihood is equally applicable to other copula models: Letting {p(u|0) : 9 € 0} denote a 



parametric family of copula densities and {yij = Gj(ui t 



■3)i ' 



1 . . . , n, j = 1, . . . ,p} be the observed 
data, the extended rank likelihood for 9 is given by Pr(max{u£;j : ykj < yij} < Uij < mm{ukj ■ 
Vi,j < Vk,j}ii = 1) • • • > n -> 3 = 1) ■ ■ ■ ,p\@)- Given a prior distribution on 9, posterior inference can 
be obtained via a Markov chain Monte Carlo algorithm which iteratively resamples values of 9 and 
the Ui j's. However, full conditional distributions for these unknown quantities are generally hard 
to come by, and an MCMC sampler based on the Metropolis-Hastings algorithm is required for 
most models. 

Code to to implement the estimation strategy outlined in Section 3, written in the R statistical 
computing environment, is provided in the Appendix. A more detailed open-source software package 
is downloadable from R-archive at the following website: 



http : / / cran. r-project . org/src/contrib/Descriptions/sbgcop .html 



A R-code for Gaussian copula estimation 

# See also http://cran.r-project.org/src/contrib/Descriptions/sbgcop.html 
# 

# Preconditions: Y, an n-observations by p-variables matrix 

# SO, a p x p prior covariance matrix 

# nO, an integer hyperparameter 

# NSCAN, an integer number of iterations 

########## helper function 

rwish<-f unction(SO ,nu) { # sample from a Wishart distribution 

sSO<-chol(SO) 

Z<-matrix(rnorm(nu*dim(SO) [1]) ,nu,dim(SO) [1] )7,*'/.sS0 

t(zr/.*y.z } 



########## starting values 
n<-dim(Y) [1] 
p<-dim(Y) [2] 
set . seed(l) 

Z<-qnorm (apply (Y,2 .rank, ties . method= "random" ) / (n+1) ) 
Zf ill<-matrix(rnorm(n*p) ,n,p) 
Z[is.na(Y)]<-Zfill[is.na(Y) ] 
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Z<- t( (t(Z)-apply(Z ) 2 ) mean))/apply(Z,2,sd) ) 
S<-cov(Z) 



########## constraints 
R<-NULL 

for(j in l:p) { R<-cbind(R, match (Y [,j] , sort (unique (Y[,j])))) } 

########## start of Gibbs sampling scheme 
for(nscan in 1:NSCAN) { 

#### update Z[,j] 

for(j in sampled :p)) { 

S j c<- S [ j , - j ] '/.♦'/.solve ( S [- j , - j ] ) 

sdj<- sqrt( S[j,j] -S[j ) -j] 1 /.* 1 /.solve(S[-j ) -j]) 1 /.*y.S[-j ) j] ) 
muj<- Z[,-j]7.*7.t(Sjc) 

for(r in sort (unique (R[,j] )) H 

ir<- (l:n) [R[,j]==r k ! is .na(R[, j] )] 
lb<-suppressWarnings(max( Z[ R[, j] <r , j] ,na. rm=T) ) 
ub<-suppressWarnings (min( Z[ R[, j] >r , j] ,na. rm=T) ) 
Z[ir, j] <-qnorm(runif (length(ir) , 

pnorm(lb,muj [ir] ,sdj) ,pnorm(ub,muj [ir] ,sdj)) ,muj [ir] ,sdj) 
} 

ir<-(l:n) [is .na(R[, j] )] 
Z[ir, j] <-rnorm(length(ir) ,muj [ir] ,sdj) 
} 

#### update S 

S<-solve (rwish(solve (SO*nO+t (Z) 7,*7,Z) ,nO+n) ) 
} 

########## end of Gibbs sampling scheme 
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